NSF PAR Search | NSF Public Access Repository

LncRNA Subcellular Localization Across Diverse Cell Lines: An Exploration Using Deep Learning with Inexact q-mers

https://doi.org/10.3390/ncrna11040049

Yi, Weijun; Miller, Jason R; Hu, Gangqing; Adjeroh, Donald A (August 2025, Non-Coding RNA)

Background: Long non-coding Ribonucleic Acids (lncRNAs) can be localized to different cellular compartments, such as the nuclear and the cytoplasmic regions. Their biological functions are influenced by the region of the cell where they are located. Compared to the vast number of lncRNAs, only a relatively small proportion have annotations regarding their subcellular localization. It would be helpful if those few annotated lncRNAs could be leveraged to develop predictive models for localization of other lncRNAs. Methods: Conventional computational methods use q-mer profiles from lncRNA sequences and train machine learning models such as support vector machines and logistic regression with the profiles. These methods focus on the exact q-mer. Given possible sequence mutations and other uncertainties in genomic sequences and their role in biological function, a consideration of these variabilities might improve our ability to model lncRNAs and their localization. Thus, we build on inexact q-mers and use machine learning/deep learning techniques to study three specific problems in lncRNA subcellular localization, namely, prediction of lncRNA localization using inexact q-mers, the issue of whether lncRNA localization is cell-type-specific, and the notion of switching (lncRNA) genes. Results: We performed our analysis using data on lncRNA localization across 15 cell lines. Our results showed that using inexact q-mers (with q = 6) can improve the lncRNA localization prediction performance compared to using exact q-mers. Further, we showed that lncRNA localization, in general, is not cell-line-specific. We also identified a category of lncRNAs that switch cellular compartments between different cell lines (we call them switching lncRNAs). These switching lncRNAs complicate the problem of predicting lncRNA localization using machine learning models, showing that lncRNA localization is still a major challenge.
Free, publicly-accessible full text available August 1, 2026
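
As a rough illustration of the q-mer profiles mentioned in the abstract above, the sketch below counts exact q-mers and a simple "inexact" variant. It assumes that inexact means "within one substitution (Hamming distance 1)"; the paper's actual definition may differ, and the function names here are hypothetical, not taken from the authors' code.

```python
from itertools import product

ALPHABET = "ACGT"


def empty_profile(q):
    """Return a dict with one zero count for each of the 4**q possible q-mers."""
    return {"".join(p): 0 for p in product(ALPHABET, repeat=q)}


def exact_qmer_profile(seq, q):
    """Count every length-q window of seq that contains only A, C, G, T."""
    counts = empty_profile(q)
    for i in range(len(seq) - q + 1):
        qmer = seq[i:i + q]
        if qmer in counts:
            counts[qmer] += 1
    return counts


def one_mismatch_neighbors(qmer):
    """List all q-mers that differ from qmer at exactly one position."""
    out = []
    for i, base in enumerate(qmer):
        for alt in ALPHABET:
            if alt != base:
                out.append(qmer[:i] + alt + qmer[i + 1:])
    return out


def inexact_qmer_profile(seq, q):
    """Assumed inexact scheme: each window also credits its 1-mismatch neighbors."""
    counts = empty_profile(q)
    for i in range(len(seq) - q + 1):
        qmer = seq[i:i + q]
        if qmer not in counts:
            continue  # skip windows with ambiguous bases such as N
        counts[qmer] += 1
        for neighbor in one_mismatch_neighbors(qmer):
            counts[neighbor] += 1
    return counts


exact = exact_qmer_profile("ACGT", 2)
inexact = inexact_qmer_profile("ACGT", 2)
```

Either profile is a fixed-length vector of 4**q counts (4096 entries for q = 6) that could be fed to a classifier; the inexact variant spreads each observation over nearby q-mers, which is one way to tolerate point mutations.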
SSL-SurvFormer: A Self-Supervised Learning and Continuously Monotonic Transformer Network for Missing Values in Survival Analysis

https://doi.org/10.3390/informatics12010032

Le, Quang-Hung; Patel, Brijesh; Adjeroh, Donald; Doretto, Gianfranco; Le, Ngan (March 2025, Informatics)

Survival analysis is a crucial statistical technique used to estimate the anticipated duration until a specific event occurs. However, current methods often involve discretizing the time scale and struggle with managing absent features within the data. This becomes especially pertinent since events can transpire at any given point, rendering event analysis a continuous concern. Additionally, the presence of missing attributes within tabular data is widespread. By leveraging recent developments of Transformer and Self-Supervised Learning (SSL), we introduce SSL-SurvFormer. This entails a continuously monotonic Transformer network, empowered by SSL pre-training, that is designed to address the challenges presented by continuous events and absent features in survival prediction. Our proposed continuously monotonic Transformer model facilitates accurate estimation of survival probabilities, thereby bypassing the need for temporal discretization. Additionally, our SSL pre-training strategy incorporates data transformation to adeptly manage missing information. The SSL pre-training encompasses two tasks: mask prediction, which identifies positions of absent features, and reconstruction, which endeavors to recover absent elements based on observed ones. Our empirical evaluations conducted across a variety of datasets, including FLCHAIN, METABRIC, and SUPPORT, consistently highlight the superior performance of SSL-SurvFormer in comparison to existing methods. Additionally, SSL-SurvFormer demonstrates effectiveness in handling missing values, a critical aspect often encountered in real-world datasets.
Free, publicly-accessible full text available March 1, 2026
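
The two pre-training tasks named in the abstract above (mask prediction and reconstruction) can be pictured with a small data-preparation sketch. This is only an assumption about how such training pairs might be built: it hides a random subset of the observed features in one tabular row and records what a model would be asked to predict. The function name, the `hide_fraction` parameter, and the zero fill value are all hypothetical and are not taken from SSL-SurvFormer.

```python
import random


def make_pretext_example(row, hide_fraction=0.3, fill_value=0.0, seed=None):
    """Build one self-supervised example from a row; None marks a missing value."""
    rng = random.Random(seed)
    observed = [i for i, v in enumerate(row) if v is not None]
    # Hide at least one observed feature when any are available.
    n_hide = max(1, int(len(observed) * hide_fraction)) if observed else 0
    hidden = set(rng.sample(observed, n_hide))

    # Model input: missing and hidden positions are replaced by fill_value.
    corrupted = [
        fill_value if (v is None or i in hidden) else v
        for i, v in enumerate(row)
    ]
    # Mask-prediction target: 1 where the input value is absent, else 0.
    mask_target = [
        1 if (v is None or i in hidden) else 0
        for i, v in enumerate(row)
    ]
    # Reconstruction target: original values at the artificially hidden positions.
    recon_target = {i: row[i] for i in sorted(hidden)}
    return corrupted, mask_target, recon_target


row = [1.0, None, 3.0, 4.0, 5.0]
corrupted, mask_target, recon_target = make_pretext_example(row, seed=0)
```

Only the artificially hidden positions have a known ground truth, so the reconstruction target is limited to them; truly missing values contribute to the mask target but not to reconstruction.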
A Framework for Evaluating Model Trustworthiness in Classification of Very High Resolution Histopathology Images

https://doi.org/10.1109/BIBM62325.2024.10822778

Nouyed, Mohammad Iqbal; Doretto, Gianfranco; Adjeroh, Donald A (December 2024, IEEE)

Full Text Available
A Machine Learning Approach to Motif Finding for LncRNA Sub-cellular Localization

https://doi.org/10.1109/BIBM62325.2024.10822208

Yi, Weijun; Miller, Jason R; Hu, Gangqing; Adjeroh, Donald A (December 2024, IEEE)

Full Text Available
Evaluation of machine learning models that predict lncRNA subcellular localization

https://doi.org/10.1093/nargab/lqae125

Miller, Jason R.; Yi, Weijun; Adjeroh, Donald A. (September 2024, NAR Genomics and Bioinformatics)

The lncATLAS database quantifies the relative cytoplasmic versus nuclear abundance of long non-coding RNAs (lncRNAs) observed in 15 human cell lines. The literature describes several machine learning models trained and evaluated on these and similar datasets. These reports showed moderate performance, e.g. 72–74% accuracy, on test subsets of the data withheld from training. In all these reports, the datasets were filtered to include genes with extreme values while excluding genes with values in the middle range, and the filters were applied prior to partitioning the data into training and testing subsets. Using several models and lncATLAS data, we show that this ‘middle exclusion’ protocol boosts performance metrics without boosting model performance on unfiltered test data. We show that various models achieve only about 60% accuracy when evaluated on unfiltered lncRNA data. We suggest that the problem of predicting lncRNA subcellular localization from nucleotide sequences is more challenging than currently perceived. We provide a basic model and evaluation procedure as a benchmark for future studies of this problem.
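
The difference between filtering before and after partitioning, as described in the abstract above, can be sketched as follows. The records, thresholds, and function names are hypothetical; each record is assumed to be a (gene id, localization score) pair, and this is not the authors' benchmark code.

```python
import random


def middle_exclusion(records, low, high):
    """Keep only records whose score is at or beyond the extreme thresholds."""
    return [r for r in records if r[1] <= low or r[1] >= high]


def split(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def filter_then_split(records, low, high, **kwargs):
    """Filter first: the test set also contains only extreme-valued genes."""
    return split(middle_exclusion(records, low, high), **kwargs)


def split_then_filter_train(records, low, high, **kwargs):
    """Split first: only the training set is filtered, the test set is untouched."""
    train, test = split(records, **kwargs)
    return middle_exclusion(train, low, high), test


# Synthetic scores from -5.0 to 4.9, for illustration only.
records = [(f"gene{i}", (i - 50) / 10) for i in range(100)]
train_a, test_a = filter_then_split(records, low=-1.0, high=1.0)
train_b, test_b = split_then_filter_train(records, low=-1.0, high=1.0)
```

Under the first protocol the model is never scored on mid-range genes, so accuracy measured on `test_a` can look better than accuracy on an unfiltered set such as `test_b`.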
ItpCtrl-AI: End-to-end interpretable and controllable artificial intelligence by modeling radiologists’ intentions

Pham, Trong-Thang; Brecheisen, Jacob; Wu, Carol C; Nguyen, Hien; Deng, Zhigang; Adjeroh, Donald; Doretto, Gianfranco; Choudhary, Arabinda; Le, Ngan (December 2024, Artificial Intelligence in Medicine)

Full Text Available
Retain and Adapt: Online Sequential EEG Classification With Subject Shift

https://doi.org/10.1109/TAI.2024.3385390

Duan, Tiehang; Wang, Zhenyi; Shen, Li; Doretto, Gianfranco; Adjeroh, Donald A; Li, Fang; Tao, Cui (September 2024, IEEE Transactions on Artificial Intelligence)

Full Text Available
FG-CXR: A Radiologist-Aligned Gaze Dataset for Enhancing Interpretability in Chest X-Ray Report Generation

https://doi.org/10.1007/978-981-96-0960-4_5

Pham, Trong Thang; Ho, Ngoc-Vuong; Bui, Nhat-Tan; Phan, Thinh; Patel, Brijesh; Adjeroh, Donald; Doretto, Gianfranco; Nguyen, Anh; Wu, Carol C; Nguyen, Hien; et al (December 2024, Springer Nature Singapore)

Full Text Available
Online continual decoding of streaming EEG signal with a balanced and informative memory buffer

https://doi.org/10.1016/j.neunet.2024.106338

Duan, Tiehang; Wang, Zhenyi; Li, Fang; Doretto, Gianfranco; Adjeroh, Donald A; Yin, Yiyi; Tao, Cui (August 2024, Neural Networks)

Full Text Available
TSRNET: Simple Framework for Real-Time ECG Anomaly Detection with Multimodal Time and Spectrogram Restoration Network

https://doi.org/10.1109/ISBI56570.2024.10635676

Bui, Nhat-Tan; Hoang, Dinh-Hieu; Phan, Thinh; Tran, Minh-Triet; Patel, Brijesh; Adjeroh, Donald; Le, Ngan (May 2024, IEEE)

Full Text Available
